Skip to content

Validate synthetic ID format on inbound header and cookie values#508

Merged
prk-Jr merged 11 commits intomainfrom
validate-synthetic-id-format
Mar 30, 2026
Merged

Validate synthetic ID format on inbound header and cookie values#508
prk-Jr merged 11 commits intomainfrom
validate-synthetic-id-format

Conversation

@prk-Jr
Copy link
Copy Markdown
Collaborator

@prk-Jr prk-Jr commented Mar 16, 2026

Summary

  • Inbound synthetic IDs from the x-synthetic-id header and synthetic_id cookie were accepted without validation, allowing injection of arbitrary strings into response headers, cookies, and third-party API calls
  • Adds a private is_valid_synthetic_id() validator (64 lowercase hex + . + 6 alphanumeric) with an O(1) length check first to bound all downstream work; invalid values are silently discarded and a fresh ID is generated in their place
  • Hardens logging by ensuring the raw ID value never appears in warn logs and demoting synthetic ID values from INFO to DEBUG to avoid recording pseudonymous identifiers in production log pipelines

Changes

File Change
crates/common/src/synthetic.rs Add is_valid_synthetic_id() production validator; validate in get_synthetic_id(); add debug_assert! in generator; demote ID values to debug!; new rejection + fallthrough tests
crates/common/src/test_support.rs Add shared VALID_SYNTHETIC_ID constant
crates/common/src/proxy.rs Update test fixture to use VALID_SYNTHETIC_ID
crates/common/src/integrations/registry.rs Update test fixture to use valid-format cookie value
docs/guide/synthetic-ids.md Document inbound validation behaviour and clarify HMAC determinism
CHANGELOG.md Add ### Security entry under [Unreleased]

Closes

Closes #412

Test plan

  • cargo test --workspace
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --all -- --check
  • JS format: cd crates/js/lib && npm run format
  • Docs format: cd docs && npm run format

Checklist

  • Changes follow CLAUDE.md conventions
  • No unwrap() in production code — use expect("should ...")
  • Uses tracing macros (not println!)
  • New code has tests
  • No secrets or credentials committed

Inbound synthetic IDs from the x-synthetic-id header and synthetic_id
cookie were accepted without validation. An attacker could inject
arbitrary strings — including very long values, special characters, or
newlines — which were then set as response headers, cookies, and
forwarded to third-party APIs.

Adds a private is_valid_synthetic_id() validator enforcing the canonical
format (64 lowercase hex chars + '.' + 6 alphanumeric chars). The length
check is O(1) and runs first to bound all downstream work. Invalid values
are silently discarded and a fresh ID is generated in their place; the
raw value is never written to logs.

Also adds a debug_assert! in generate_synthetic_id() to catch any future
regression in the generator, moves VALID_SYNTHETIC_ID to test_support so
it is shared across all test modules, and demotes synthetic ID values from
INFO to DEBUG in log output to avoid recording pseudonymous identifiers in
production log pipelines.

Closes #412
@prk-Jr prk-Jr self-assigned this Mar 16, 2026
Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see below

@ChristianPavilonis ChristianPavilonis dismissed their stale review March 16, 2026 19:47

Re-posting as inline comments

Copy link
Copy Markdown
Collaborator

@ChristianPavilonis ChristianPavilonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-executed security hardening. Validation logic is correct and thoroughly tested. Approving with one suggestion inline.

Strengths:

  • O(1) length check gates all downstream work — good defense against oversized input on an edge server
  • Lowercase hex enforcement prevents intermediary case-normalization from creating fake-valid IDs
  • debug_assert! in the generator ensures validator and generator never drift apart
  • Logging hygiene — invalid values logged with len={} only, never the raw attacker-controlled string
  • Excellent test coverage with 8+ new/updated test cases
  • Shared VALID_SYNTHETIC_ID constant eliminates test fragility across 3 modules
  • CHANGELOG and docs properly updated

Note: The lol_html bump (e295f3a) in Cargo.toml/Cargo.lock appears to be a merge artifact from main — unrelated to the synthetic ID validation. Not a problem, just flagging it.

Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Well-executed security hardening PR that validates synthetic ID format on inbound header and cookie values. The validation logic is correct and test coverage is thorough. One gap in the revocation path needs attention.

Blocking

🔧 wrench

  • Revocation path bypasses validation: existing_ssc_cookie in publisher.rs:379-387 is read from the raw cookie jar before get_or_generate_synthetic_id runs validation. This raw, unvalidated value is then: (1) logged at log::info! level (line 381) — potentially logging attacker-controlled content, and (2) passed to delete_consent_from_kv as a KV store key (line 387) — allowing a crafted cookie value to target arbitrary KV keys for deletion. This is the same class of injection this PR aims to fix. Suggested fix: make is_valid_synthetic_id pub(crate) and validate existing_ssc_cookie before use, or restructure so the already-validated synthetic ID is reused for revocation. At minimum, track as a follow-up issue.

Non-blocking

🏕 camp site

  • Clarify old vs new validation layers (cookies.rs:38): The old synthetic_id_has_only_allowed_chars (permits [a-zA-Z0-9._-]) remains for outbound cookie sanitization, while the new is_valid_synthetic_id in synthetic.rs is strictly tighter for inbound validation. A brief comment noting this layered relationship would help future readers.

⛏ nitpick

  • PR body references stale crates/common/src/ paths: actual paths are crates/trusted-server-core/src/...
  • PR checklist says "tracing macros": project uses log crate, not tracing

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS
  • js tests: PASS

@prk-Jr prk-Jr requested a review from aram356 March 25, 2026 07:42
Copy link
Copy Markdown
Collaborator

@aram356 aram356 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Clean, well-designed security fix that validates inbound synthetic IDs from the x-synthetic-id header and synthetic_id cookie. The is_valid_synthetic_id() validator is correct, defense-in-depth layering across synthetic.rs, cookies.rs, and publisher.rs is sound, and test coverage is thorough.

Non-blocking

📌 out of scope

  • Synthetic ID still logged at INFO in proxy.rs redirect handler: The redirect handler at proxy.rs:717 still logs the validated synthetic ID value at log::info!. Low risk since the value is already validated, but inconsistent with the PR's stated goal of demoting ID values to DEBUG. Worth a follow-up.

🤔 thinking

  • split_once('.') vs old split('.') is subtly correct: The new approach naturally handles multi-dot values because the total length constraint (71 = 64 + 1 + 6) means extra dots fail the alphanumeric suffix check. The dot_in_suffix test covers this. Sound design choice.

👍 praise

  • Excellent defense-in-depth in publisher.rs revocation path (publisher.rs:382-399): Always expires the cookie regardless of validity, only passes validated IDs to delete_consent_from_kv, logs only the length of invalid values. Prevents malicious cookie values being used as KV keys.
  • Synthetic ID values removed from log output (synthetic.rs): Old code logged actual ID values at trace level; new code logs only structural information. Eliminates log-exfiltration of pseudonymous identifiers.
  • O(1) length guard before character scanning (synthetic.rs:42): Constant-time length check before iteration is a nice DoS mitigation for oversized inputs.
  • Comprehensive test coverage: Covers invalid header/cookie rejection, header precedence, fallthrough behavior, generation on invalid input, and edge cases (uppercase hex, oversized input, empty string, dot in suffix, extra segments, missing suffix, non-hex, non-alphanumeric suffix).

⛏ nitpick

  • CHANGELOG says "64-hex-hmac" but validator requires lowercase hex (CHANGELOG.md:12): Saying "64-lowercase-hex-hmac" would be more precise. Minor since the code is authoritative.

CI Status

  • fmt: PASS
  • clippy: PASS
  • rust tests: PASS
  • js tests: PASS

@prk-Jr prk-Jr merged commit 8d942ee into main Mar 30, 2026
13 checks passed
@prk-Jr prk-Jr deleted the validate-synthetic-id-format branch March 30, 2026 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No synthetic ID format validation on inbound values

3 participants